NSF PAR Search | NSF Public Access Repository

MambaByte: Token-free Selective State Space Model

Wang, Junxiong; Gangavarapu, Tushaar; Yan, Jing Nathan; Rush, Alexander M (October 2024, COLM)

Token-free language models learn directly from raw bytes and remove the inductive bias of subword tokenization. Operating on bytes, however, results in significantly longer sequences. In this setting, standard autoregressive Transformers scale poorly because the effective memory required grows with sequence length. The recent development of the Mamba state space model (SSM) offers an appealing alternative with a fixed-size memory state and efficient decoding. We propose MambaByte, a token-free adaptation of the Mamba SSM trained autoregressively on byte sequences. In terms of modeling, we show MambaByte to be competitive with, and even to outperform, state-of-the-art subword Transformers on language modeling tasks while maintaining the benefits of token-free language models, such as robustness to noise. In terms of efficiency, we develop an adaptation of speculative decoding with tokenized drafting and byte-level verification. This results in a 2.6× inference speedup over the standard MambaByte implementation, with decoding efficiency similar to that of the subword Mamba. These findings establish the viability of SSMs in enabling token-free language modeling.
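As a rough illustration of the two properties this abstract contrasts (byte inputs make sequences longer, while a recurrent model carries a fixed-size state), here is a minimal Python sketch. It is not the MambaByte model: `byte_ids`, `attention_cache_size`, and `recurrent_state` are hypothetical names, the whitespace split is only a crude stand-in for a subword tokenizer, and the scalar linear recurrence is a toy substitute for Mamba's selective SSM.

```python
def byte_ids(text: str) -> list[int]:
    """Token-free input: the UTF-8 bytes of the text, each in 0..255."""
    return list(text.encode("utf-8"))


def attention_cache_size(ids: list[int]) -> int:
    """Toy stand-in for a Transformer cache: one entry kept per position,
    so the memory grows linearly with sequence length."""
    cache = []
    for b in ids:
        cache.append(b)
    return len(cache)


def recurrent_state(ids: list[int], decay: float = 0.9) -> float:
    """Toy linear recurrence (an assumption, not Mamba's selective SSM):
    the whole history is folded into one scalar, so memory stays constant."""
    state = 0.0
    for b in ids:
        state = decay * state + (1.0 - decay) * (b / 255.0)
    return state


if __name__ == "__main__":
    sample = "Token-free models read raw bytes."
    ids = byte_ids(sample)
    print("whitespace words:", len(sample.split()))
    print("bytes:", len(ids))
    print("cache entries:", attention_cache_size(ids))
    print("recurrent state (one float):", recurrent_state(ids))
```

The byte sequence is several times longer than the word-level one, the cache grows with it, and the recurrent state remains a single number regardless of input length.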
Full Text Available
Overfill: Two-Stage Models for Efficient Language Model Decoding

Kim, Woojeong; Wang, Junxiong; Yan, Jing Nathan; Abdelfattah, Mohamed S; Rush, Alexander M (October 2024, OpenReview)

Full Text Available
Diffusion Models Without Attention

https://doi.org/10.1109/CVPR52733.2024.00787

Yan, Jing Nathan; Gu, Jiatao; Rush, Alexander M (June 2024, IEEE)

Full Text Available
Predicting Text Preference Via Structured Comparative Reasoning

https://doi.org/10.18653/v1/2024.acl-long.541

Yan, Jing Nathan; Liu, Tianqi; Chiu, Justin; Shen, Jiaming; Qin, Zhen; Yu, Yue; Lakshmanan, Charumathi; Kurzion, Yair; Rush, Alexander; Liu, Jialu; et al (January 2024, Association for Computational Linguistics)

Full Text Available
Tessera: Discretizing Data Analysis Workflows on a Task Level

https://doi.org/10.1145/3411764.3445728

Yan, Jing Nathan; Gu, Ziwei; Rzeszotarski, Jeffrey M (May 2021, CHI '21: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems)

Researchers have investigated a number of strategies for capturing and analyzing data analyst event logs in order to design better tools, identify failure points, and guide users. However, this remains challenging because individual- and session-level behavioral differences lead to an explosion of complexity and there are few guarantees that log observations map to user cognition. In this paper we introduce a technique for segmenting sequential analyst event logs which combines data, interaction, and user features in order to create discrete blocks of goal-directed activity. Using measures of inter-dependency and comparisons between analysis states, these blocks identify patterns in interaction logs coupled with the current view that users are examining. Through an analysis of publicly available data and data from a lab study across a variety of analysis tasks, we validate that our segmentation approach aligns with users’ changing goals and tasks. Finally, we identify several downstream applications for our approach.
Full Text Available
Understanding User Sensemaking in Machine Learning Fairness Assessment Systems

https://doi.org/10.1145/3442381.3450092

Gu, Ziwei; Yan, Jing Nathan; Rzeszotarski, Jeffrey M. (April 2021, WWW '21: Proceedings of the Web Conference 2021)

A variety of systems have been proposed to assist users in detecting machine learning (ML) fairness issues. These systems approach bias reduction from a number of perspectives, including recommender systems, exploratory tools, and dashboards. In this paper, we seek to inform the design of these systems by examining how individuals make sense of fairness issues as they use different de-biasing affordances. In particular, we consider the tension between de-biasing recommendations, which are quick but may lack nuance, and “what-if” style exploration, which is time-consuming but may lead to deeper understanding and transferable insights. Using logs, think-aloud data, and semi-structured interviews, we find that exploratory systems promote a rich pattern of hypothesis generation and testing, while recommendations deliver quick answers which satisfy participants at the cost of reduced information exposure. We highlight design requirements and trade-offs in the design of ML fairness systems to promote accurate and explainable assessments.
Full Text Available
DataPrep.EDA: Task-Centric Exploratory Data Analysis for Statistical Modeling in Python

https://doi.org/10.1145/3448016.3457330

Peng, Jinglin; Wu, Weiyuan; Lockhart, Brandon; Bian, Song; Yan, Jing Nathan; Xu, Linghao; Chi, Zhixuan; Rzeszotarski, Jeffrey M.; Wang, Jiannan (June 2021, SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data)

Exploratory Data Analysis (EDA) is a crucial step in any data science project. However, existing Python libraries fall short in helping data scientists complete common EDA tasks for statistical modeling. Their API design is either too low level, optimized for plotting rather than EDA, or too high level, making it hard to specify fine-grained EDA tasks. In response, we propose DataPrep.EDA, a novel task-centric EDA system in Python. DataPrep.EDA allows data scientists to declaratively specify a wide range of EDA tasks at different granularities with a single function call. We identify a number of challenges in implementing DataPrep.EDA, and propose effective solutions to improve the scalability, usability, and customizability of the system. In particular, we discuss some lessons learned from using Dask to build the data processing pipelines for EDA tasks and describe our approaches to accelerate the pipelines. We conduct extensive experiments to compare DataPrep.EDA with Pandas-profiling, the state-of-the-art EDA system in Python. The experiments show that DataPrep.EDA significantly outperforms Pandas-profiling in terms of both speed and user experience. DataPrep.EDA is open-sourced as an EDA component of DataPrep: https://github.com/sfu-db/dataprep.
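The idea of a task-centric interface, where one function call selects the EDA task from the arguments it receives, can be sketched in plain Python. This is an illustrative assumption only: `explore` and its return format are hypothetical, it uses just the standard library, and it is not DataPrep.EDA's API or implementation.

```python
from collections import Counter
from statistics import mean


def explore(rows: list[dict], *columns: str) -> dict:
    """Hypothetical single entry point: the number of column names
    passed determines the granularity of the analysis."""
    if not columns:
        # No column given: dataset overview (missing and distinct counts).
        names = sorted({key for row in rows for key in row})
        summary = {}
        for name in names:
            present = [row.get(name) for row in rows if row.get(name) is not None]
            summary[name] = {
                "missing": len(rows) - len(present),
                "distinct": len(set(present)),
            }
        return {"task": "overview", "columns": summary}

    if len(columns) == 1:
        # One column: univariate summary, numeric or categorical.
        values = [row.get(columns[0]) for row in rows if row.get(columns[0]) is not None]
        if values and all(isinstance(v, (int, float)) for v in values):
            return {
                "task": "univariate-numeric",
                "min": min(values),
                "max": max(values),
                "mean": mean(values),
            }
        return {"task": "univariate-categorical", "counts": dict(Counter(values))}

    if len(columns) == 2:
        # Two columns: count the rows where both values are present.
        x, y = columns
        pairs = [
            (row.get(x), row.get(y))
            for row in rows
            if row.get(x) is not None and row.get(y) is not None
        ]
        return {"task": "bivariate", "pairs": len(pairs)}

    raise ValueError("explore() accepts at most two column names")


if __name__ == "__main__":
    data = [
        {"age": 30, "city": "A"},
        {"age": 40, "city": None},
        {"age": None, "city": "A"},
    ]
    print(explore(data))
    print(explore(data, "age"))
    print(explore(data, "age", "city"))
```

The caller states what to examine (nothing, one column, or two columns) and the function chooses the matching analysis, rather than the caller assembling plots step by step.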
Full Text Available
Silva: Interactively Assessing Machine Learning Fairness Using Causality

https://doi.org/10.1145/3313831.3376447

Yan, Jing Nathan; Gu, Ziwei; Lin, Hubert; Rzeszotarski, Jeffrey M. (April 2020, CHI '20: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems)
Full Text Available
